GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition
Abstract
In human-computer interaction, Speech Emotion Recognition (SER) plays an essential role in understanding the user's intent and improving the interactive experience. Although utterances with similar sentiment differ in speaker characteristics, they share common antecedents and consequences, so a key challenge for SER is how to produce robust and discriminative representations through the causality between speech emotions. In this paper, we propose a Gated Multi-scale Temporal Convolutional Network (GM-TCNet) to construct a novel emotional causality representation learning component with a multi-scale receptive field. GM-TCNet deploys this component to capture the dynamics of emotion across the time domain; it is constructed from a dilated causal convolution layer and a gating mechanism. In addition, it uses skip connections to fuse high-level features from different gated convolution blocks and so capture the abundant and subtle emotion changes in human speech. GM-TCNet first uses a single type of feature, mel-frequency cepstral coefficients, as inputs and then passes them through the gated temporal convolutional module to generate high-level features. Finally, the features are fed to the emotion classifier to accomplish the SER task. The experimental results show that our model maintains the highest performance in most cases compared to state-of-the-art techniques.
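The building blocks named in the abstract (dilated causal convolution, a gating mechanism, and skip connections that fuse the outputs of several gated blocks) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names, the single-channel input, the kernel values, and the tanh/sigmoid form of the gate are all assumptions chosen for illustration, and the real model operates on multi-channel MFCC features with learned weights.

```python
import math


def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation]; samples before t=0 count as zero."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # never look at future samples (causality)
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_block(x, filter_kernel, gate_kernel, dilation):
    """Assumed gate form: tanh(filter path) scaled by sigmoid(gate path)."""
    f = causal_dilated_conv(x, filter_kernel, dilation)
    g = causal_dilated_conv(x, gate_kernel, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def gated_stack(x, kernels, dilations):
    """Chain gated blocks with growing dilation and sum every block's
    output (skip connections) so several time scales are fused."""
    skip_sum = [0.0] * len(x)
    h = x
    for (fk, gk), d in zip(kernels, dilations):
        h = gated_block(h, fk, gk, d)
        skip_sum = [s + v for s, v in zip(skip_sum, h)]
    return skip_sum


# Hypothetical toy input standing in for one MFCC coefficient over time.
frames = [1.0, 0.5, -0.3, 0.8, 0.0, 0.2]
toy_kernels = [([0.6, 0.4], [0.5, -0.5])] * 3  # illustrative, not learned
fused = gated_stack(frames, toy_kernels, dilations=[1, 2, 4])
```

Doubling the dilation per block widens the receptive field exponentially with depth, which is one common way to obtain a multi-scale view of the sequence; because each output at time t depends only on inputs at or before t, changing a later frame leaves earlier outputs unchanged.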
Related Resources
Spectro-temporal Modulations for Robust Speech Emotion Recognition
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Emotion recognition of conversational affective speech using temporal course modeling
In a natural conversation, a complete emotional expression is typically composed of a complex temporal course representing temporal phases of onset, apex, and offset. In this study, subemotional states are defined to model the temporal course of an emotional expression in natural conversation. Hidden Markov Models (HMMs) are adopted to characterize the subemotional states; each represents one t...
Modeling Perceivers Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion Recognition
Developing automatic emotion recognition by modeling expressive behaviors is becoming crucial in enabling the next-generation design of human-machine interfaces. With the availability of functional magnetic resonance imaging (fMRI), researchers have also conducted studies into the quantitative understanding of the vocal emotion perception mechanism. In this work, our aim is twofold: 1) investiga...
Emotion recognition using imperfect speech recognition
This paper investigates the use of speech-to-text methods for assigning an emotion class to a given speech utterance. Previous work shows that an emotion extracted from text can convey complementary evidence to the information extracted by classifiers based on spectral, or other non-linguistic features. As speech-to-text usually presents significantly more computational effort, in this study we...
Journal
Journal title: Speech Communication
Year: 2022
ISSN: 1872-7182, 0167-6393
DOI: https://doi.org/10.1016/j.specom.2022.07.005